


‘CUT AND PASTE’ ALGORITHMS 


MODEL STRUCTURE 




THE STRUCTURE VIEWED AS A COLLECTION OF 
DISCONNECTED SUBSTRUCTURES 





A ‘CUT AND PASTE’ ALGORITHM 


• Predictor phase:

$\tilde{d}_{n+1} = d_n + \Delta t\, v_n + (1/2 - \beta)\,\Delta t^2\, a_n$
$\tilde{v}_{n+1} = v_n + (1 - \gamma)\,\Delta t\, a_n$

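• Equation solving phase:

$a_{n+1} = 0$

for $s = 1, NS$ do

$a^s_{n+1} = -(M^s + \beta\,\Delta t^2 K^s)^{-1} K^s \tilde{d}^s_{n+1}$
$a_{n+1} \leftarrow a_{n+1} + M^s a^s_{n+1}$

$a_{n+1} \leftarrow M^{-1} a_{n+1}$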

• Corrector phase:

$d_{n+1} = \tilde{d}_{n+1} + \beta\,\Delta t^2\, a_{n+1}$

$v_{n+1} = \tilde{v}_{n+1} + \gamma\,\Delta t\, a_{n+1}$
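
The three phases can be sketched for a small free-free bar in which every two-node element is treated as its own subdomain. This is only an illustration under stated assumptions: the equation-solving phase is taken to be an independent solve on each subdomain followed by mass-weighted averaging at shared dof, the mass is lumped, there are no external forces, and the names `cut_and_paste_step`, `solve2`, `k` and `m` are hypothetical.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [x0, x1]


def cut_and_paste_step(d, v, a, dt, beta=0.25, gamma=0.5, k=1.0, m=1.0):
    """Advance a bar of len(d) - 1 elements by one time step.

    Assumptions (not from the source): each element is one subdomain,
    k is the element stiffness, m is the element mass lumped as m/2 per node.
    """
    n = len(d)

    # Predictor phase.
    d_pred = [d[i] + dt * v[i] + (0.5 - beta) * dt * dt * a[i] for i in range(n)]
    v_pred = [v[i] + (1.0 - gamma) * dt * a[i] for i in range(n)]

    # Equation solving phase: independent local solves, then mass averaging.
    weighted = [0.0] * n   # accumulates M^s a^s at each node
    node_mass = [0.0] * n  # accumulates the assembled lumped mass
    c = beta * dt * dt * k
    for e in range(n - 1):
        A = [[0.5 * m + c, -c], [-c, 0.5 * m + c]]  # M^s + beta dt^2 K^s
        stretch = d_pred[e + 1] - d_pred[e]
        rhs = [k * stretch, -k * stretch]           # -K^s d_pred^s
        a_loc = solve2(A, rhs)
        for j, node in enumerate((e, e + 1)):
            weighted[node] += 0.5 * m * a_loc[j]
            node_mass[node] += 0.5 * m
    a_new = [weighted[i] / node_mass[i] for i in range(n)]

    # Corrector phase.
    d_new = [d_pred[i] + beta * dt * dt * a_new[i] for i in range(n)]
    v_new = [v_pred[i] + gamma * dt * a_new[i] for i in range(n)]
    return d_new, v_new, a_new
```

In this sketch only the averaging loop touches dof shared between subdomains; the local solves are independent of each other, which is where the concurrency would come from.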


INTERPROCESSOR COMMUNICATIONS



REDUCED SUBSTRUCTURES SHOWING THE 
COMMUNICATION DUE TO SHARED DEGREES 
OF FREEDOM.


OVERVIEW OF GENERAL PROPERTIES


• Parameters: 

n = Number of dof in structure. 
s = Number of element groups. 
p = Number of processors. 
i = Number of interface dof.


• General properties:

i) Newmark's method is obtained for $s = 1$.

ii) Unconditional stability for all $s$ and $\gamma \ge 1/2$, $\beta \ge \gamma/2$.

iii) Full concurrency on a p-processor machine
(p < n) except for O(i) operation (mass-averaging).

iv) For given accuracy and $n/s \to \infty$,

SPEED-UP = $O(p\sqrt{s})$ (2D)
SPEED-UP = $O(ps)$ (3D)
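
Properties i) and ii) can be illustrated in the simplest setting: one subdomain and one undamped degree of freedom, where the predictor, solve and corrector reduce to Newmark's method. The sketch below is an assumption-laden toy (the names `newmark_sdof` and `energy`, and the scalar model $m a + k d = 0$, are not from the source). With $\beta = 0.25$, $\gamma = 0.5$ the energy should stay constant even for a time step far larger than the natural period.

```python
def newmark_sdof(d0, v0, k, m, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*a + k*d = 0 with the predictor / solve / corrector form."""
    d, v = d0, v0
    a = -k * d / m  # consistent initial acceleration
    for _ in range(steps):
        # Predictor phase.
        d_pred = d + dt * v + (0.5 - beta) * dt * dt * a
        v_pred = v + (1.0 - gamma) * dt * a
        # Equation solving phase (scalar, single subdomain).
        a = -k * d_pred / (m + beta * dt * dt * k)
        # Corrector phase.
        d = d_pred + beta * dt * dt * a
        v = v_pred + gamma * dt * a
    return d, v


def energy(d, v, k, m):
    """Total mechanical energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * d * d


# Natural frequency is 1 rad/s, so dt = 10 is far beyond any explicit limit.
e0 = energy(1.0, 0.0, 1.0, 1.0)
d_end, v_end = newmark_sdof(1.0, 0.0, k=1.0, m=1.0, dt=10.0, steps=200)
e_end = energy(d_end, v_end, 1.0, 1.0)
```

The response stays bounded for this large step, but a bounded response is not an accurate one; the accuracy restriction on the time step is a separate matter.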


General properties. Note two-parameter dependence of speed-up esti-

Figure 3. Discretization and partition of the bar problem.


One-dimensional test for assessing communication efficiencies. 
Figure 4. Discretization and partition of the plane stress problem on a
32-processor computer.


Two-dimensional test for assessing communication efficiencies. 
N      Computation    Communication    Efficiency    Rate
       time (ms)      time (ms)        (%)           (Mflops)

64      1.47          0.46             76.32         0.233
96      2.20          0.46             82.86         0.253
160     3.67          0.46             88.96         0.271
288     6.61          0.46             93.55         0.285
544    12.49          0.46             96.48         0.294


Table 5. Performance of the Bar problem on the 32 Processor Hypercube. 


No. of      Computation    Communication    Efficiency    Rate
Elements    time (ms)      time (ms)        (%)           (Mflops)

128           25.2            7.8           76.3          0.232
288           57.1            8.6           87.0          0.265
800          161.3           10.6           93.9          0.286
2592         530.9           14.6           97.3          0.297
9248        1915.2           22.6           98.8          0.301
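
The efficiency columns in both tables appear consistent with computation time divided by total (computation plus communication) time. That relation is an assumption, since the source does not state how efficiency was computed. The sketch below checks it against the tabulated values; the names `BAR_1D`, `PLANE_2D` and `efficiency` are hypothetical.

```python
# Rows: (problem size, computation ms, communication ms, reported efficiency %).
BAR_1D = [
    (64, 1.47, 0.46, 76.32),
    (96, 2.20, 0.46, 82.86),
    (160, 3.67, 0.46, 88.96),
    (288, 6.61, 0.46, 93.55),
    (544, 12.49, 0.46, 96.48),
]
PLANE_2D = [
    (128, 25.2, 7.8, 76.3),
    (288, 57.1, 8.6, 87.0),
    (800, 161.3, 10.6, 93.9),
    (2592, 530.9, 14.6, 97.3),
    (9248, 1915.2, 22.6, 98.8),
]


def efficiency(comp_ms, comm_ms):
    """Assumed definition: share of total time spent computing, in percent."""
    return 100.0 * comp_ms / (comp_ms + comm_ms)


# Largest gap between the assumed formula and the reported column.
max_gap = max(
    abs(efficiency(comp, comm) - reported)
    for _, comp, comm, reported in BAR_1D + PLANE_2D
)
```

Under this assumption the recomputed values differ from the reported ones by less than 0.2 percentage points, which is about what rounding of the tabulated times would produce.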


















Communication efficiencies for 1-D and 2-D test cases. Note efficiencies 




COMPUTATIONAL EFFICIENCY

• COST ≈ $\frac{1}{2} n b^2 + 2 n b$, ($b$ = semi-bandwidth)

• Square mesh, $l^2$ elements:


$C_{GLOBAL} \approx \frac{1}{2}(l + 2)^2 (l + 1)^2 + 2(l + 2)(l + 1)^2$

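• Partitioned mesh, $s = m^2$ subdomains:


$C_{PARTITIONED} \approx m^2 \left[ \frac{1}{2}\left(\frac{l}{m} + 2\right)^2 \left(\frac{l}{m} + 1\right)^2 + 2\left(\frac{l}{m} + 2\right)\left(\frac{l}{m} + 1\right)^2 \right]$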



• Equation solving speed-up ($n/s \to \infty$):


SPEED-UP(2D) = $C_{GLOBAL} / C_{PARTITIONED}$ = $O(s)$
SPEED-UP(3D) = $C_{GLOBAL} / C_{PARTITIONED}$ = $O(s^{4/3})$
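
The 2D estimate can be checked numerically with a short sketch. It assumes one dof per node, so that a square mesh of side $l$ has $n = (l+1)^2$ and $b = l + 2$, and it assumes the partitioned cost is $m^2$ independent band solves on blocks of side $l/m$. The function names are hypothetical.

```python
def band_cost(n, b):
    """Band solver cost estimate: (1/2) n b^2 + 2 n b."""
    return 0.5 * n * b * b + 2.0 * n * b


def global_cost(l):
    """Cost for the unpartitioned square mesh of l x l elements."""
    return band_cost((l + 1) ** 2, l + 2)


def partitioned_cost(l, m):
    """Assumed cost of m*m independent subdomains of (l/m) x (l/m) elements."""
    side = l / m
    return m * m * band_cost((side + 1) ** 2, side + 2)


def speed_up(l, m):
    """Equation-solving speed-up on one processor, before any time step penalty."""
    return global_cost(l) / partitioned_cost(l, m)


# 1024 elements (l = 32) split into s = 4 subdomains, and a much finer mesh.
coarse = speed_up(32, 2)
fine = speed_up(4096, 2)
```

For `l = 32`, `m = 2` this gives a ratio of about 3.07, and the ratio approaches $s = 4$ as $l/m$ grows, in line with the $O(s)$ estimate.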
2D CASE (1024 ELEMENTS)

[Figure: SPEED-UP vs. NUMBER OF SUBDOMAINS (0 to 1200)]





Estimated equation solution speed-up for one application of the algo- 
ACCURACY ANALYSIS

• Algorithmic phase errors, 1D case:

• Maximum celerity of computed waves:

$c_{max} = \Delta L / \Delta t$

$\Delta L$ = subdomain size

• For accurate results, need to take

$c_{max} > c$, or $\Delta t < \Delta L / c$

$c$ = wave celerity
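
As a small worked illustration of this bound, the helpers below evaluate $c_{max}$ and the limiting time step for a given subdomain size. The numbers used (a 1D domain of length 2 cut into 4 equal subdomains, unit wave celerity) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def max_celerity(delta_L, dt):
    """Largest wave celerity the computed solution can carry: delta_L / dt."""
    return delta_L / dt


def time_step_limit(delta_L, c):
    """Time step below which c_max exceeds the physical celerity c."""
    return delta_L / c


# Hypothetical 1D example: length 2, four equal subdomains, celerity 1.
delta_L = 2.0 / 4
dt_limit = time_step_limit(delta_L, c=1.0)   # 0.5
c_max = max_celerity(delta_L, dt=0.25)       # 2.0, comfortably above c = 1
```

Halving the subdomain size halves the admissible time step, which is the penalty that offsets part of the equation-solving gain.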
Accuracy requirements derived from an analysis of phase errors in one

ACCURACY REQUIREMENTS

• Square mesh, $s = m^2$ subdomains:

$\Delta L = L/m \approx O(1/\sqrt{s})$

$\Delta t < \Delta L / c = L/mc \approx O(1/\sqrt{s})$

• Net speed-up (p = 1):

SPEED-UP(2D) ≈ $O(s) \times O(1/\sqrt{s}) = O(\sqrt{s})$

• Cubic mesh, $s = m^3$ subdomains:

$\Delta L = L/m \approx O(1/s^{1/3})$

$\Delta t < \Delta L / c = L/mc \approx O(1/s^{1/3})$

• Net speed-up (p = 1):

SPEED-UP(3D) ≈ $O(s^{4/3}) \times O(1/s^{1/3}) = O(s)$
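
The scaling argument can be written out as a short sketch that multiplies the equation-solving gain by the time step penalty. It treats the order-of-magnitude estimates as if all constants were one, which is an assumption made purely for illustration; the function name `net_speed_up` is hypothetical.

```python
def net_speed_up(s, dim):
    """Leading-order net speed-up on one processor for s subdomains.

    Solve gain:   s in 2D, s**(4/3) in 3D.
    Step penalty: time step shrinks like s**(-1/dim), so more steps are needed.
    """
    if dim == 2:
        solve_gain = float(s)
    elif dim == 3:
        solve_gain = s ** (4.0 / 3.0)
    else:
        raise ValueError("dim must be 2 or 3")
    step_penalty = s ** (-1.0 / dim)
    return solve_gain * step_penalty


square = net_speed_up(16, dim=2)   # sqrt(16) = 4
cubic = net_speed_up(27, dim=3)    # about 27
```

In this idealized form the net gain grows like $\sqrt{s}$ in 2D and like $s$ in 3D, so partitioning pays off more strongly in three dimensions.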


Estimated equation solution speed-ups for a square mesh in large scale

NUMERICAL TESTS


Square membrane, simply supported, subjected 
to uniform initial velocity. 

Finite deflection FE formulation. Triangular elements:


$\frac{T A^2}{2 A_0}$


$T$ = tension

$A_0$ = initial area of triangle
$A$ = deformed area.


Quadrilateral elements: 


Parameters: L = 2, T = 1, p = 1. 
Error measure:


ERROR = $\left[ \int dt\, |w(t) - w_{exact}(t)|^2 \right]^{1/2}$
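
One way such an error measure could be evaluated from sampled time histories is sketched below. It assumes the measure is an L2 norm in time of the difference between a computed deflection history and the exact one, and it uses the trapezoidal rule for the integral. Both the interpretation and the name `l2_time_error` are assumptions.

```python
import math


def l2_time_error(times, w, w_exact):
    """Approximate [ integral |w - w_exact|^2 dt ]^(1/2) by the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        h = times[i] - times[i - 1]
        e_prev = (w[i - 1] - w_exact[i - 1]) ** 2
        e_curr = (w[i] - w_exact[i]) ** 2
        total += 0.5 * h * (e_prev + e_curr)
    return math.sqrt(total)


# Hypothetical data: a computed history offset from the exact one by 0.1.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
exact = [math.sin(t) for t in times]
computed = [x + 0.1 for x in exact]
err = l2_time_error(times, computed, exact)
```

A constant offset of 0.1 over a unit time interval yields an error of 0.1, which gives a quick sanity check on the implementation.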
[Figure panels: 1 SUBDOMAIN, h = 0.05; UNIFORM IMPACT ($\beta = 0.25$, $\gamma = 0.5$)]

ACCURACY REQUIREMENTS

[Figure: horizontal axis NUMBER OF SUBDOMAINS]




Actual vs. estimated time step requirements as a function of number of 
1024 ELEMENT CASE

NSUB    Secs.    Speed-up    Theory

1       1143     1
4        776     1.47

3.51
Actual vs. estimated speed-ups for square membrane problem on a
single processor. Timings correspond to the equation-solving phase only.